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The maximum entropy method is shown to be a special limit of the stochastic analytic continuation 
method introduced by Sandvik [Phys. Rev. B 57, 10287 (1998)]. We employ a mapping between the 
analytic continuation problem and a system of interacting classical fields. The Hamiltonian of this 
system is chosen such that the determination of its ground state field configuration corresponds to 
an unregularized inversion of the analytic continuation input data. The regularization is effected by 
performing a thermal average over the field configurations at a small fictitious temperature using 
Monte Carlo sampling. We prove that the maximum entropy method, the currently accepted state 
of the art, is simply the mean field limit of this fully dynamical procedure. We also describe a 
technical innovation: we suggest that a parallel tempering algorithm leads to better traversal of the 
phase space and makes it easy to identify the critical value of the regularization temperature. 



I. INTRODUCTION 

Wick rotation transforms imaginary time correlation 
functions into real, measurable response functions. Ana- 
lytical results, or numerical results fit to a known func- 
tional form, allow for a simple substitution of variables: 
e.g., —it i ► t{l + i0 + ). In general, however, this is not 
possible. To interpret the results of computer simulations 
such as quantum Monte Carlo and to make comparisons 
with experiment, we require a technique that reliably ex- 
tracts spectral information from imaginary time data. At 
issue is how best to do this given that the input data is 
intrinsically noisy and incomplete. 

The most widely used technique is the maximum en- 
tropy method (MEM)?ii2i 3 . which selects the best can- 
didate solution that is consistent with the data. Here, 
"best" means most likely in the Bayesian sense. There 
are several variations on the algorithm, but in general 
it plays out as a competition between the goodness- 
of-fit measure x 2 an d the entropic prior S. In prac- 
tice, one minimizes the functional \ 2 — a~ x S (for some 
a -1 =^ 0). The presence of the entropic prior introduces 
a non-linearity that pulls the minimum away from the 
least squares solution. One of the key advantages to the 
method is that it is rigourously derived from statistical 
considerations and guarantees a unique solution. 

Another strategy is to generate a sequence of possible 
solutions and then take their mean, with the hope that 
spurious features will be averaged out and legitimate fea- 
tures reinforced (as, e.g., in Ref.Q). Such methods, how- 
ever, tend to be ad hoc and are not rigourously justified. 
There are no criteria for selecting which solutions to in- 
clude or for assigning their relative weights in the sum. 
Moreover, how these schemes are related to the MEM 
solution is unclear. There is no reason a priori to be- 
lieve that an average over several possible spectra will be 
closer to the true spectrum than the single most probable 
one. 

Nonetheless, there is compelling evidence that averag- 
ing methods can produce better spectra than the MEM. 



In particular, Sandvik^ has shown that an unbiased ther- 
mal average of all possible spectra, Boltzmann weighted 
according to x 2 , produces (in several test cases) an aver- 
age spectrum that is in better agreement with the true 
spectrum (found via exact diagonalization) than is the 
MEM result. Indeed, our own experience suggests that 
the MEM is unduly biased toward smooth solutions: 
sharp spectral features tend to be washed out or obliter- 
ated. 

In this paper, we show how the averaging approach can 
be made systematic. We relate the analytic continuation 
problem to a system of interacting classical fields living 
on the unit interval and prove that the MEM solution is 
realized as its mean field configuration. From that point 
of view, Sandvik's method amounts to allowing thermal 
fluctuations about this mean field configuration. It is, in 
some sense, the most natural dynamical generalization of 
the MEM. Finally, we sketch out an improved algorithm 
for performing the stochastic sampling and provide test 
results for the two methods applied to the spectrum of a 
simple BCS superconductor. 

II. ANALYTIC CONTINUATION 

A dynamical correlation function of imaginary time, 
G(r) = (T[O(r)d t (0)]), satisfies the (anti-)periodicity 
relation G(t + (3) = ^G(r), where the upper sign holds 
for fermionic operators and the lower sign for bosonic 
ones. Since it is uniquely determined by its values in the 
region r <E [0, /3), the function admits a discrete Fourier 
transform 

G{r) = -Y j e-^ T G{u Jn ), (1) 

GK) = / dre^ r G{r), (2) 
Jo 

where the sum is over the Matsubara frequencies uj n = 
(2n + l)7r//3 for fermions and w„ = 2mr/P for bosons, 
with n € Z. 
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Provided that |G(w„) falls off at least as fast as l/|w„| 
when n — > oo (which is guaranteed so long as the opera- 
tor (anti-)commutator satisfies (OO^ ±0^0) < oo), the 
Fourier components are representable in terms of a func- 
tion of the form 



G[z) 



duo p(tu) 
, 2n z — LU 



(3) 



with the identification G(u> n ) = Q(iuj n ). The function 
p(uj) is real-valued and satisfies p(uj) > for fermions 
and sgn(oj)p(oj) > for bosons. Note that G{z) is ana- 
lytic everywhere in the complex plane, with the possible 
exception of the real line. Wherever p(u) is nonzero, 
there will be a corresponding jump in Q(z): 



G(uj + z0+) - G(uj - i0+) = ±p(oj). 



(4) 



The principle of analytic continuation states that given 
the value of G(z) at a countably infinite number of points 
along the imaginary axis — by which we mean that G{oj n ) 
or, equivalently, G(r) is known — we can uniquely extend 
G(z) from those points to the full complex plane. In par- 
ticular, we can find its values just above and just below 
the real axis and hence, via Eq. @, extract p(ui). 

According to Eq. ©, we can write 



T 



dto p{uj) 
2ir iui n — u> 



(5) 



Transforming back to imaginary time, via Eq. Q), and 
performing the Matsubara frequency sum yields 



G(r) 



dw 1 >r-^ e 

dcu 



iu> n — Ul 

t pW) 



-p(uj) 



2tt e-P" ± 1 
du> K(t,oj)A(uj). 



(6) 



In the last line, we have defined 

e~ wr /{e~ 0w + 1) fermions 



and 



A{w) 



oje~ UT /{e-^ - 1) bosons 



p(oj)/2ir fermions 
p(oj)/2ttuj bosons. 



(7) 



(8) 



(For some applications it may be more appropriate to 
define K(t,uj) = and A(ui) = p(w)/2 7 r(e~' 3w - 1) in 
the bosonic case.) The spectral function A(oj), which we 
shall view as the main quanitity of interest, is positive 
definite and satisfies a sum rule J dio A(lo) = M < oo. 

Equation © tells us that we can interpret G(r) as 
a linear functional of A(ui) with kernel K(t,uj). Hence, 
the analytic continutation is equivalent to the functional 



inversion A(uo) = K _1 [G(t)]. Only a finite inversion 
is practicable, however. If we discretize frequency and 
imaginary time using a uniform mesh (with spacings At 
and Aw), then Aj = A(Auj ■ j)Au> and G k = G(Ar • k) 
are related by Aj = ^ k Kj^Gk- The problem is thus 
reduced to a matrix inversion of 



gAwA-r-j-fc 



(9) 



This inversion is not an easy one to perform, how- 
ever. The condition number of Kjk is extremely large: 
the matrix will have eigenvalues both exponentially large 
and exponentially small in (3. This means that compu- 
tation of the inversion requires extremely high numerical 
precision^ Worse, the inversion problem is ill-posed and 
responds badly to any measurement error in the input 
set Gk- The inversion typically overfits the noise with 
spurious high-frequency modes in Aj. 

The history of practical analytic continuation methods 
is one of continual refinement of the procedures for regu- 
larization of the matrix inversion. The simplest example 
of regularization is to try 



A, =Y J {K k] +\5 k] )- 1 G k . 



(10) 



Since the high-frequency modes in Aj are generated by 
the smallest eigenvalues of Kj k , a nonzero value of A will 
have the effect of suppressing those modes with eigen- 
values on the order or A or smaller. To see this, note 
that for each eigenvalue E of Kj k , there is an eigen- 
value in the inverse matrix that is modified according 
to l/.E-> l/{E + \). 

This naive scheme has two major flaws. First, filtering 
out the high frequency modes in this way has the effect of 
eliminating from the spectral function all fine structure 
below a certain frequency scale, whether spurious or real. 
Second, it does not ensure that Aj > 0, as required. The 
MEM, which we describe briefly in the next section, is 
considerably more sophisticated about what to filter and 
has nonnegativity built in. 



III. MAXIMUM ENTROPY METHOD 

Suppose that to the exact function G(r) we have a 
measured approximation G(r). In practice, this will usu- 
ally have been generated from some Monte Carlo simu- 
lation, so that 



G(t) = G(r) + statistical noise. 
The goodness-of-fit functional 



X 2 [A] 







dr 



o <rf 



dujK(T,uj)A(uj)-G{T) 



(11) 



(12) 



measures how closely the correlation function generated 
from A(uj) [via Eq. 10, the forward model] matches G(r). 
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Here, <t(t) is the best-guess estimate of the total measure- 
ment error in G(r). (See Appendix [SJ) There is also an 
entropy associated with each spectral function, 



S[A] 



dujA{u) ln(A(w)/D(w)) 



(13) 



which measures the information content of A{u>). Here, 
D(u) is the so-called default model, a smooth function 
that serves as the zero (maximum) entropy configura- 
tion. Any features of the true spectral function known in 
advance can be encoded in D(uj). 

It can be shown that the likelihood of any A(u>) be- 
ing the true spectral function is equal to V[A] ~ e~®^ 
where Q — x 2 — a^ 1 S (and a -1 is a parameter that con- 
trols the degree of regularization). The MEM solution 
corresponds to the spectral function that minimizes Q. In 
practice, the minimization of Q is treated as a numerical 
optimization problem and is typically performed using 
the Newton-Raphson algorithm or some other gradient 
search technique. Nonetheless, a formal solution can be 
found by identifying the spectral function for which Q 
is stationary with respect to functional variation. The 
result, derived in Appendix iBl is 



A(lo) = e afl D{uj) exp 



-2a 
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dr 



cr(r) ; 



where 



dujK(T,uj)A(uj) - G(t) 



(14) 



(15) 



and is a Lagrange multiplier chosen to enforce the nor- 
malization J dio A(u>) = J\f . 

In two trivial limits, this set of equations can be 
solved exactly. When a — > oo, Eq. I|14(l demands that 
ip — > 0. This yields the noisy, unregularized spectrum 
A(lj) = K _1 [G(t)], which is the solution that minimizes 
X 2 [A] . When a —> 0, A(w) = D(w), the smooth default 
function. This solution maximizes S [A]. Note that these 
results come about because Q ~ x 2 [A] an d Q ~ — S[A], 
respectively, in the two limits. 

Over the full range of intermediate values (0 < a < 
oo), Eq. I|14|) constitutes a one-parameter family of solu- 
tions interpolating between these two extremes. An addi- 
tional condition must be imposed to remove this ambigu- 
ity, i.e., to turn the family of solutions into a single final 
spectrum. In classic MEM, one takes the point of view 
that somewhere between over-fitting and over-smoothing 
lies an ideal intermediate range centred on some optimal 
value of a. In other schemes, the final result is produced 
by averaging, A(ui) = L daw(a)A(a,uj)/ J Q daw(a), 
in which case the question becomes which weighting func- 
tion w(a) to use. In their definitive review^ Jarrel and 
Gubernatis address these issues in greater detail. 



IV. THE STOCHASTIC APPROACH 

In this section and the next, we introduce the stochas- 
tic analytic continuation approach and demonstrate how 



it is related to the MEM. To start, consider a smooth 
mapping <f> '■ K M [0, 1] , which takes the frequency do- 
main of the spectral function onto the unit interval. Such 
a function will be of the form 



|»'4 



duD{u) 



(16) 



where D = N4>' is positive definite and (like A) normal- 
ized to N but otherwise arbitrary. (We use the notation 
D for the mapping's kernel in anticipation of identifying 
it with the default model of the MEM.) Then, 



1 



1 



duj A(uj) = / d(f)(ui) 



A(u) 



dxn(x). (17) 



D(u) jo 

In the last line, we have made the change of variables 
x = 4>{ijj) and introduced the dimensionless field 

A(4>-H*)) 



<X) D(^{x)) 

which, according to Eq. |17jl. is normalized to unity 
Under this change of variables, Eq. (|12f> becomes 



(18) 



H[n(x)} = 



' dr 
a{rf 



dx K(T,x)n(x) — G(r) 



(19) 



with K(t, 4>(uj)) = K(r,u>). We take the point of view 
that Eq. I|19l) is the Hamiltonian for the system of clas- 
sical fields {n(x)}. Then, supposing the system is held 
fixed at a fictitious inverse temperature a, it has a par- 
tition function Z = J T>n e~ aH with a measure of inte- 
gration 



Jvn^J^ (j[dn(x)j6^ dxn(x) - 1 

The thermally averaged value of the field is 
(n(x)) = i j Vnn(x)e~ aH[n] . 



(20) 



(21) 



The corresponding "thermally regulated" spectral func- 
tion, 

(A(u)) = WH))D( U ), (22) 

can be recovered using Eq. (|18(l . 

At zero temperature (a — > oo), Eq. (|21|l simply picks 
out the ground-state field configuration; the correspond- 
ing spectral function is the unregularized analytic con- 
tinuation result. In the high temperature limit (a — ► 0), 
Eq. (|21|l represents an unweighted average over all pos- 
sible field configurations. In that case, the average 
is completely independent of the input function G(t) 
and as such can only yield the zero-information result 
(n(x)) = 1. From Eq. l(T%)l. it follows that D(oj) is the 
corresponding spectral function. 

These limits are precisely those of the MEM, which we 
discussed at the end of Sect. IIIII Note that the kernel 
of the mapping in Eq. i|16|) plays the same role as the 
MEM's default model and the fictitious temperature the 
same role as the MEM's regularization parameter. 
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V. APPROXIMATE SOLUTIONS 

Now let us extend our "interacting classical field" anal- 
ogy a little further. Expanding the square in Eq. <|19[) . 
we can cast the Hamiltonian in the familiar form 



H[n(x)] = / dxe(x)n(x) 
Jo 

1 f 1 

+ -/ dxdyV(x,y)n(x)n(y), (23) 



with a free dispersion 



e(x) = -2 f *L-G(t)K{t,x) 
jo a \ T ) 



(24) 



and an interaction term 



V(x,y) = V(y,x) 



dr 
o p(t-)'' 



K{r,x)K{r,y). (25) 



Noninteracting system — Let us ignore the interaction 
term for a moment and proceed by setting V — 0. Then, 
if we represent the delta function constraint in Eq. H2()|l 
with an integral representation 



/oo 
d( exp(i(X), 
-OO 



(26) 



the partition function is simply Z — d£e 1< >Z(£), 
where 

Z(C) = (j[ dn(x)j e" /o *" (<"(»)-<0»(*). (27) 



The saddle point solution for the field is 
S ( 1 



n(x) 



5e(x) \ a 



■InZ(C) 



a(e(x)-/i) 



(28) 



This says that the fields are Maxwell-Boltzmann dis- 
tributed according to their energy as measured with re- 
spect to a chemical potential /i = iC,/u, which is chosen 

such that J^dxn(x) = 1. 

Mean field treatment — Now let us reintroduce V. As- 
suming that fluctuations of the n(x) field about its mean 
value are negligible, 



(n(x) - n(x)) (n(y) - n(y)) ks 0, 
the Hamiltonian has a mean field form 



Hmf = I dx E{x)n{x) + const., 
'a 



(29) 



(30) 



where 



E(x) 



SH\ 



Sn(x) 



e{x)+ dyV{x,y)h(y). (31) 



Equation l|30[) leads to the saddle point solution given 
by Eq. 128|) but now with e(x) replaced by E(x). Using 
the definition of E{x) from Eq. 1)31 [I. we arrive at the 
self-consistent equation 



n(x) = e a/i exp 



~a(e(x)+ dyV(x,y)n(y)^ 



(32) 



Again, fi is a chemical potential used to fix the normal- 
ization. 

Now consider the reverse change of variables taking 
n(x) back to A(lu). With only a little effort, one can 
show that Eq. is identical to Eq. {H}. What this 
tells us is that the mean field treatment of the classical 
field system is formally equivalent to the MEM. 

We can make this equivalence more explicit still. The 
free energy density of the system we have just described 
is F = U — a~ 1 S — /i, where the internal energy is given 
by U — H[n(x)] and the entropy (see Appendix by 



S[f 



dx n(x) In n(x). 



(33) 



As we saw earlier, Eqs. (|12|) and H19|) are connected by 
a change of variables. Similarly, 



S[n] 



dx n(x) In n(x) 



= - id^)4^iJ A{ " ] 



duo A(uS) In 



£>(u;) \D{uj) 
A(uj 



(34) 



= S[A], 



where the final equality follows from comparison with 
Eq. Thus, x 2 = H[n{x)] and S = S[n(x)], which 

makes clear that FAf = Q = x 2 — — fjj\f. This 

means that the MEM solution is just the one that mini- 
mizes the free energy of the {n(x)} system at the mean 
field level. 



VI. MONTE CARLO EVALUTAION 

A. Configurations and Update Scheme 

The energy of a given field configuration, given by 
Eq. 119|) . can be written in the form 



H[n(x)} = / dTh(r) 2 , 
Jo 



(35) 



where 



h(r) = -. . / dx K(t, x)n(x) — g(r) (36) 
Jo 

and g(r) = G(t)/ct(t) is the input Green's function 
rescaled by the variance. 
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n c <x) 



a x 



FIG. 1: A field configuration of delta functions nc(x) is spec- 
ified by a set C = {r 7 , a 7 } of residues and coordinates. 



Computing (n(x)) requires that we integrate over all 
possible field configurations. To accomplish this, we need 
some ansatz to render the measure T>n finite. One choice 
is to represent each field configuration as a superposition 
of delta functions. In that case, we can parameterize each 
configuration by a set of residues and coordinates C — 
{r 7 ,et 7 } satisfying r 7 > 0, < a 7 < 1, and ^ 7 r 7 = 1. 
The corresponding field configuration is 



Tic 



(x) = r 7 5(x — a 7 ). 



(37) 



The partition function Z = J dC c-xjp(-aHe) has a new 
computationally tractable measure 



(38) 



In order to calculate the energy He of a given config- 
uration via Eq. (|35|l , we shall need the relation 



9(t) + h c {r) 



°{ T ) Jo 



dx K{t, x)nc{x) 



— J2r-yK(T, a 7 ). 



(39) 



Now suppose that the configuration is modified (C i— » C") 
by altering the parameters in some subset A of the delta 
function walkers: 



r 7 = r 7 



a 



7 



a 



7 



^2 S 7 \Ar x , 
AeA 

^ S 7 \Aa\. 



(40) 



AeA 

Accordingly, he i— » he = he + Ah, where 



AeA 



(41) 



The configuration energy changes to 



The Monte Carlo procedure is to calculate He and 
hc(r) for some arbitrary starting configuation C and 
then update them whenever a walk is accepted. Ac- 
ceptance is determined according to the usual Metropo- 
lis algorithm: create a modified trial configuration and 
compute its energy shift AH = He 1 — He following 
Eq. (|42[1 : choose a random real number £ G [0,1]; if 
exp(— aAH) > £, accept the walk and update 



H c i * H C > =H C + AH, 
he i— > foe = + A/i. 



(43) 



The path of the delta function walkers through the con- 
figuration space must be normalization-conserving and 
must satisfy detailed balance. Moreover, the entire phase 
space must, in principle, be accessible. Only two types of 
moves are necessary to meet these criteria: (1) coordinate 
shifting moves, in which the walker A is translated by a 
distance Aa\, and (2) weight sharing moves, in which 
the total residue of a subset of walkers is reapportioned 



amongst themselves such that 



1 is preserved. 



It is useful, however, to introduce additional weight 
sharing moves that also conserve higher moments 



f din(xK=Vr 7 £ 7 ". 
Jo 



(44) 



Sandvik has shown that such moves dramatically improve 
the acceptance ratio of attempted walks at low tempera- 
ture. At a minimum we want to consider walks that pre- 
serve the overall normalization Ai^ = 1. But we also 
consider rearrangements of weight between n > 2 walkers 
that conserve the first n—1 moments. Such a move can be 
effected as follows. Let A = {Ai, A2, . . . , A n } = {Ai} U A. 
Defining the scale factors 

(-1 ifA = Ai 

Qx = fr^' if AeA, (45) 



we can express the changes in residue as 
r' x = r x + Ar x = r x - s Qx, 



(46) 



where s parameterizes the 1-dimensional line of con- 
straint through the n-dimensional space of residues. In 
order to preserve the positivity of the residues, we must 
impose r' x > 0. Hence, we need to ensure that rx > 
QxArx for all A £ A. Accordingly, we take s to be ran- 
domly distributed in the interval 

max(r A /QA) < s < min (rx/Qx), (47) 
AeA- AeA+ 

where A" = {A : Qx < 0} and A+ = {A : Q A > 0}. 



H C > 



dr (h c {T) + Ah(r)Y 
-0 



H c + / dr Ah(T)[2h c (r) + Ah(r) 
Jo 



(42) 



B. Parallel Tempering 

The Monte Carlo algorithm described above can be 
improved by introducing parallel tempering. 8 The idea 
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FIG. 2: The acceptance ratio of configuration swaps between 
adjacent levels (a p «-> a p +i) evolves as a function of the num- 
ber of updates performed. When the system is fully thermal- 
ized, the acceptance ratios stabilize to asymptotic values. 



is to allow multiple instantiations of the simulation 
to proceed simultaneously for a variety of parameters 
{ao, oli, . . . , o>n} covering a large range of inverse temper- 
atures. The temperature profile is arbitrary, but we shall 
find it convenient to choose a constant ratio a p+ i/a p = R 
between one temperature layer and the next. 

Most important, the field configurations in each layer 
are made to evolve in parallel but not independently. 
Configurations are swapped between adjacent layers in 
such a way that preserves detailed balance and ensures 
that each layer p will eventually settle into thermal equi- 
librium at inverse temperature a p . The update rule is 
quite simple: given two adjacent layers p and q — p ± 1, 
choose a random real number £ £ [0,1] and swap the p 
and q configurations if 



exp 



{a p - a q )(H p - H q 



(48) 



Parallel tempering eliminates the need for a separate, 
initial annealing staged Because the simulation simulta- 
neously samples over a large temperature range, there 
is no danger of getting trapped in false minima: the in- 
tcrlayer walks always provide a cheap pathway between 
configurations separated by large energy barriers. All 
that is required is to let the system thermalize for some 
time before sampling (i.e., before actually beginning to 
bin and tabulate the field configurations). By tracking 
the average acceptance rates for swaps between layers, 
it is straightforward to determine when the system has 
equilibrated. Figure[21shows a sample run (for a test case 
to be described in Scct. lVIlT|) . We see that on a stochas- 
tic time-scale of several tens of thousands of moves, each 
temperature layer settles into thermal equilibrium. 

An additional advantage of the parallel tempering algo- 
rithm is that it yields in one run a complete temperature 
profile of all the important thermodynamic variables. In 



In U(a p ) 




10 20 30 40 50 60 
P 



FIG. 3: The internal energy of the {n(x)} system at each tem- 
perature layer is plotted. The knee at p — p* , corresponding 
to a jump in the specific heat, signals a thermodynamic phase 
transition. 



the next section, we discuss how we can put that infor- 
mation to use. 



VII. CRITICAL TEMPERATURE 

The Monte Carlo simulation yields a set of thermally 
averaged field configurations {(n(x)) ap : p = 0, 1, . . . N}. 
With little additional effort, we can also keep track of the 
internal energies {U(a p ) = (H[n]) ap : p = 0, 1, . . . N}. In 
this section, we propose a final candidate spectral func- 
tion constructed from only these quantities. 

To start, note that the specific heat can be written as 



C(a p ) 



dU(a) 



dia- 1 ) 



a p U(a p ) d\nU(a p ) 



InR 



dp 



(49) 



(See Appendix IdI) In Fig.0 lnU(a p ) is plotted for each 
temperature level. The knee in the function, occurring 
in the vicinity of the level p — p* , indicates there is a 
jump in the specific heat. At low temperatures (a > 
a* = a p *), the system freezes out and the correlations 
(n(x)n(y)) — (n(x)}(n(y)) become short-ranged. There is 
a characteristic energy scale E* = U(a*) associated with 
this phase transition. 

Recall that in the microcanonical ensemble, the aver- 
age over all configurations having energy E is given by 



(n{x)) E = / Vnn(x)5{E ~ H[n}). 



(50) 
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FIG. 4: The stochastic analytic continuation method is used 
to extract the spectrum of a BCS superconductor (bandwidth 
W — 6 and gap 2A = 1) from noisy data. The grey region in- 
dicates the statistical uncertainty of the computed spectrum. 
The inset shows the classic and Bryan MEM results. 



We propose that the final spectrum be defined as 



«n(z)» 



1 

~E* 



dE (n(x))! 



(51) 



which sums over all field configurations in the ordered 
phase (i.e., configurations with energies E satisfying < 
E < E* ). Roughly speaking, this amounts to performing 
an unbiased average over all spectral functions A that 
surpass the fitting threshold x 2 [A] < E* . 

Since the Monte Carlo simulation is performed at fixed 
temperature, however, we must make the change of vari- 
ables dE = (dU /da)da. Equation (f5T|) becomes 

«»(*)>> =^£°**(-g) («) 

The discretized version of this integral is 

Y,p=p* (U(a p ) - U(a p+1 )) {n(x)) ap 



«"(*)» = 



U(a p .)-U(a N ) 



(53) 



VIII. BCS TEST CASE 



We showed in Sect.EJthat the stochastic analytic con- 
tinuation method is a dynamical generalization of the 
MEM. The question remains, What is gained by going be- 
yond the mean field calculation? Our contention is that 
the stochastic method is better able to resolve sharp spec- 
tral features buried in noisy data. To illustrate this point, 



we have taken the spectrum of a BCS superconductor — 
which contains flat regions, steep peaks, and sharp gap 
edges — as a test case. The exact spectral function is 



M 



if A < \uj\ < W/2 
otherwise, 



(54) 



where W is the bandwidth and 2A the gap magnitude. 

From Eq. (|54|l we generated an exact G(t) using the 
forward model. We then applied random error to the 
function to create an approximate G(r), which was made 
to serve as the input data for our stochastic algorithm 
and for two flavours of the MEM — the classic method 
and a method due to Bryant (both described in Ref. 0). 
Figure 01 shows these computed spectra alongside the ex- 
act result. 

The most striking aspect of the comparison is that the 
stochastically generated spectrum does a superior job of 
modelling the gap. It closely follows the trough of the 
gap and captures some of the sharpness of the peaks at 
the gap edges. The MEM spectra, on the other hand, 
are much too smooth. The classic MEM spectrum is 
especially poor. It is at best a caricature of the true BCS 
spectrum: the sharp features are completely washed out 
and the depression at w = is not a fully developed gap. 

Bryan's algorithm does a somewhat better job of repro- 
ducing the gap and its adjacent peaks, but in doing so it 
also forms a second pair of spurious humps around uj = 2. 
In our experience, this is typical behaviour. The MEM 
method has trouble making sudden transitions from re- 
gions of high to low curvature. What one tends to get is 
a smooth curve gently oscillating around the correct re- 
sult. The stochastic method, in contrast, seems to have 
no trouble generating a flat region next to a sharp peak. 



IX. CONCLUSIONS 

In this paper, we have made the case that the MEM 
is not the best method for extracting spectral informa- 
tion from imaginary time data. Instead, we advocate the 
use of the stochastic analytic continuation method. Our 
claim is that the stochastic method is at least as good as 
the MEM and may even surpass it for a broad class of 
problems in which the spectrum to be extracted has very 
sharp features. 

This is a difficult point to argue convincingly. New 
analytic continuation methods tend to face considerable 
resistance, and claims of superiority on their behalf are 
met (quite rightly) with a high degree of skepticism. The 
MEM has a record of years of successful use in a variety 
of settings; plus, it offers the comfort of a seemingly rock- 
solid mathematical rationale. Competing schemes tend 
to lack any clear justification other than a few tantalizing 
examples of their performance in a handful of test cases. 

The prevailing opinion is that the MEM is the defini- 
tive "solution" to the analytic continuation problem. 
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Some other method may produce better spectra in par- 
ticular special cases, but as a general method, the MEM 
has to win out. The thinking goes: no other algorithm 
can outperform the MEM because its solution is, by con- 
struction, the unique, best candidate spectrum — a claim 
that rests on the firm foundation of Baycsian logic. 

What this line of reasoning misses, however, is the pos- 
sibility that an average of many likely candidates might 
better reproduce the true spectral function than does the 
single most likely spectrum. To give a path integral anal- 
ogy, we would argue that including fluctuations about a 
saddle point solution (the single most likely field con- 
figuration) can yield a result closer to the full integral. 
This is how we go about justifying the stochastic analytic 
continuation method. 

Let us be careful about what can be established 
rigourously. To be precise, the standard conditional 
probability analysis used to derive the MEM proves 
only that the most likely spectrum belongs to the fam- 
ily of solutions (parameterized by a -1 ) that minimizes 
Q = x 2 — a~ 1 S. From our point of view, then, what 
is required of an averaging method is that it produce at 
the mean field level a family of solutions that coincides 
with the MEM result. The stochastic method, as we have 
formulated it, does exactly this — under the guise of mini- 
mizing the free energy FN (= Q) of a system of classical 
fields at a fictitious temperature or 1 . 

This correspondence gives us a new way of thinking 
about the MEM solution. We know that even though 
a path integral contains jagged, discontinuous field con- 
figurations, its saddle point solution is always a smooth, 
continuous function. This highlights the main deficiency 
of the MEM — that it fails to model well spectral func- 
tions that are not sufficiently smooth — and makes clear 
why the stochastic method does not suffer from the same 
limitation. 

Another advantage of the stochastic approach is that 
it helps us to talk about the analytic continuation prob- 
lem using a more physical language. Having identified 
the regularization parameter as a temperature, we can 
ask how the system behaves thermodynamically. The 
answer, we have suggested, is that the system exhibits 
ordered and disordered phases that can be interpreted as 
the good-fitting and ill-fitting regimes. We believe that 
this gives a much more intuitive picture than does the 
somewhat obscure probability analysis of the MEM. 

We close with a recapitulation of the main results. We 
have presented a new variant of the stochastic analytic 
continuation method that differs from Sandvik's original 
prescription as follows: as a matter of mathematical for- 
mulation, it includes an additional internal freedom that 
turns out to be equivalent to specifying a default model; 
as a matter of practical implementation, it is built on a 
delta function walker scheme and takes advantage of par- 
allel tempering. We have proved that the mean field ver- 
sion of this stochastic method is equivalent to the MEM. 
Our tests suggest that it outperforms the MEM for spec- 
tra with sharp features and fine structure. 
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APPENDIX A: STATISTICAL ERROR AND 
DISCRETIZATION 

In Eq. (|I2[1 , we have used notational shorthand to gloss 
over two subtle issues. First, we have ignored the fact 
that the statistical errors between G(r) and G(r') are 
not independent for r ^ r'. In general, the errors will 
be positively correlated whenever |r — t'| is sufficiently 
small. There is also a tendency for them to be nega- 
tively (positively) correlated over long-separated times 
since G(0~) = =fG(/3) is built in to the definition of the 
correlation function. Thus, one should more properly 
write the goodness-of-fit measure as 



= JpJ drdr'A(r)G- 1 (r,r')A(r'), 



(Al) 



where A(r) = / duj K(t, u)A{uj) - G(r) and G(t,t') is 
the covariance function for G(t). 

Second, we have ignored the discrete nature of the 
known input data. A Quantum Monte Carlo algorithm, 
for example, is used to generate stochastically a sequence 
of independent measurements {G^\ G^ 2 \ . . . , G( M '}, 
where each G'™* 1 is an (L-fl)-vector holding the val- 
ues of the single-particle propagator at imaginary times 
n = /31/L for 1 = 0, 1,...,L. 

The numerical measurement of the Green's function is 
accomplished by taking the average 



(to) 



m—1 

A I 



(A2) 



M i v 



GiGv 

M 

m—1 

The corresponding covariance matrix is given by 

M 



Civ 



l T — s Y.^r ) -G l ){G^-G ll ) 



M(M - I) 



rn—1 

M r 



m—1 



(A3) 



GiGi' — GiGf 



Equation (|A1|) must now be discretized in order to make 
use of Eqs. (|A2() and (|A3() . The imaginary time integrals 
are carried out numerically on a uniform mesh of L time 
slices (spaced by At = (3/L) according to the formula 



JO ,_ n 



(A4) 
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where // = /(At-/) and the Bode's rule weights wi satisfy 
12i=o w i = L- Equaton ijAlf) becomes 



1 L 

JJ2 E W ' A ' C !>'' A ''- 



(A5) 



1,1' =o 



Since A(0) = ±A(/3), 



L-l 



x 2 = 



V- 



(A6) 



l,l'=Q 



Here, = w; + SiflWL for / = 0, 1, . . . L — 1. 

We now want to solve for the unitary transformation J7 
that diagonalizes the covariance matrix. This allows us 
to write C = in terms of a set of statistically inde- 

pendent variances S = diag(cr£, a 2 , . . . , o\). The inverse 
matrix is C*" 1 = XJ t Yr x XJ . 

Putting C u } = £fc = o jsUuUkv into Eq. yields 



E 



a. 

fe=o k 



2 l L ^2u k iw l A i 



2=0 



(A7) 



— n ft 



fc=0 



where we have defined the matrix Vjy = Ukiwi/L. 

To recapitulate, the discretization of the r integration 
is implicit in Eq. Ijl2|l ; it also presumes that we are work- 
ing in the V basis in which the covariance matrix is di- 
agonal. 



APPENDIX B: MAXIMUM ENTROPY FORMAL 
SOLUTION 



Also, since 



S 2 S 



5A(x)SA(y) 



5{x - y) 
A(x) 



(B3) 



we find that S 2 S < 0. This means that the entropy 
functional is strictly non-positive and takes its maximum 
5 = when A is equal to the default model. 

Similar considerations for x 2 [^J allow us to construct 
the total variation in Q = x 2 — We find that 



SQ[A,fi] 
5A{uj) 



dr K(t, x)ip(r) 



— a 



In 



MM 



(B4) 



where 



i>(r)= dvK(T,v)A{v)-G{ T ). 



(B5) 



APPENDIX C: CONFIGURATION AL ENTROPY 

Consider a system of N energy levels with degenera- 
cies m p (p — 1,2, ...,N). Suppose that each level is 
filled with n p indistinguishable particles. The state of 
the system is unchanged by the rearrangement of parti- 
cles within a given level. Thus, given a set of occupancies 
{0 < n p < TO p }, the number of equivalent configurations 

is fl({n p }) = Y[ p ( mp ) anc l the entropy due to this con- 
figuration is 



lnft(KI) 



s5> 



(Cl) 



We want to examine the changes in S with variations 
in A(tu). Since the spectral function is subject the the 
normalization constraint J du) A(tu) = Af, variations in 
A(x) and A(y) for x ^ y are not independent. We can 
enforce the constraint by introducing a lagrange multi- 
plier r = 1 + afi. Let us define 

S[A,T} = - J diuA In (A/D) +T J du (A-D). (Bl) 

We have assumed here that D{u>) and A(uj) have the same 
normalization. 

Variations of the extended functional, Eq. (|B1I) . look 
like 



SS , ( Aix) 

= — In ' 



SA(x) 
dS 



D(x) 



afi 



dp 



(B2) 



a dw (A(uj) - D{uj)) 



There is a unique solution that causes these two equations 
to vanish: A(x) = D(x), fi — 0. This implies that S = 
and 5S = 0. 



The binomial coefficient 



m!/(m — n)!/n! can 



be approximated using Stirling's formula to! w to In? 
In the limit of small relative occupancy, this gives 



In 



to In to — (to — n) ln(m — n) — n In n 



-nlnn. 



(C2) 



Going over to the continuum, we make the identifica- 
tion 



1 E 

v 

TUr, 



dx 



oo 
n(x) 



(C3) 



and use the counting arguments above to write the en- 
tropy associated with each field configuration: 



InfJM 



dxn{x) lnn(x). 



(C4) 
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The total entropy is we must first discretize the derivative 

S = J Vn In fi [n] w In ft [n] . (C5) 



dU 
da 



1 U{a n+1 ) - U(a n ) 
e & n Aa 



APPENDIX D: DISCRETIZATION OVER A U{a n+1 ) - U(a n ) 

LOGARITHMIC MESH = a~^\nR : 

Suppose that we want to integrate a function /(a) 
known only at the points a n = R n a for n = 0, 1, . . . N . w hi c h leads to the integrals 
The integral identity 

daf(a)= fdae a f(e a ) (Dl) raN _ ± , d{J , 

da ( ~j ) (n(x)) a 



(D5) 



follows from the change of variables a — exp(a). In this 
basis, the known points describe a uniform mesh 

a n = In a n = In cto + n In R (D2) 

with spacing Aa = a n +i — a„ = lni?. Accordingly, 

N 



da J 



N-l 



[U(a n ) - U{a n+1 )] (n(x)} an (D6) 



and 



(da/(a)«^Aae Q V(e 5 ") 

^ n=0 



EMK/k). (D3) £" ^-(-g) -E[^K)-^k + o] 

= U(a p ) - U(a N ). 



N-l 

(D7) 



When the integrand is of the form 
dU 1 dU 



(D4) Equation (|53|) is simply the ratio of these two results. 

da e a da 
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